On using high-level structured queries for integrating deep-web information sources
Authors
Abstract
The actual value of the Deep Web comes from integrating the data its applications provide. Such applications offer human-oriented search forms as their entry points, and there exist a number of tools that fill them in and retrieve the resulting pages programmatically. Solutions that rely on these tools are usually costly, which has motivated a number of researchers to work on virtual integration, also known as metasearch. Virtual integration abstracts away from the actual search forms by providing a unified search form: a programmer fills it in, and the virtual integration system translates it into the search forms of the individual applications. We argue that virtual integration costs might be reduced further if another abstraction level is provided by issuing structured queries in high-level languages such as SQL, XQuery or SPARQL; this helps abstract away from search forms. As far as we know, no proposal in the literature addresses this problem. In this paper, we propose a reference framework called IntegraWeb to solve the problems of using high-level structured queries to perform deep-web data integration. Furthermore, we provide a comprehensive report on existing proposals from the database integration and Deep Web research fields, which can be used in combination to address our problem within this reference framework.
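As a rough illustration of the translation step the abstract describes, the sketch below maps the equality conditions of a structured query onto the fields of individual search forms. It is not IntegraWeb's design: the `SourceForm` class, the `translate` function, the field names and the `example` URLs are all hypothetical, and only simple equality conditions submitted via GET are assumed.

```python
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass(frozen=True)
class SourceForm:
    """A hypothetical deep-web search form and its mapping to unified attributes."""
    name: str
    action_url: str   # placeholder URL, not a real site
    field_for: dict   # unified attribute name -> form field name


def translate(conditions: dict, form: SourceForm) -> str:
    """Turn equality conditions on unified attributes into a GET URL for one form.

    Conditions the form cannot express are skipped here; a real system would
    need to filter the returned results on them afterwards.
    """
    params = {
        form.field_for[attr]: value
        for attr, value in conditions.items()
        if attr in form.field_for
    }
    return f"{form.action_url}?{urlencode(params)}"


# Conditions as they might be extracted from a query such as:
#   SELECT * FROM books WHERE author = 'Eco' AND year = 1980
conditions = {"author": "Eco", "year": 1980}

forms = [
    SourceForm("shop_a", "https://shop-a.example/search", {"author": "au", "year": "yr"}),
    SourceForm("shop_b", "https://shop-b.example/find", {"author": "writer"}),
]

urls = [translate(conditions, form) for form in forms]
for url in urls:
    print(url)
```

In this sketch the second form has no field for the year, so that condition is dropped from its request; handling such mismatches is one of the problems a full framework would have to address.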
Similar resources
Querying Structured Information Sources Over the Web
To provide access to distributed and heterogeneous sources, information integration systems have traditionally relied on the availability of a mediated schema, along with mappings between this schema and the schemas of the sources. Queries posed to the mediated schema are reformulated in terms of the source schemas. On the Web, where sources are plentiful, autonomous and extremely volatil...
A New Method for Improving Computational Cost of Open Information Extraction Systems Using Log-Linear Model
Information extraction (IE) is the process of automatically producing a structured representation from unstructured or semi-structured text. It is a long-standing challenge in natural language processing (NLP), intensified by the growing volume, heterogeneity, and unstructured form of information. One of the core information extraction tasks is relation extraction wh...
A Semantic Approach to Integrating XML and Structured Data
XML is fast becoming the standard for information exchange on the Internet. As such, information expressed in XML will need to be integrated with existing information systems, which are mostly based on structured data models such as relational, object-oriented or object/relational data models. This paper shows how our previous framework for integrating heterogeneous structured data sources can also...
A Semantic Approach to Integrating XML and Structured Data Sources CAiSE 01 ID 29
XML is fast becoming the standard for information exchange on the Internet. As such, information expressed in XML will need to be integrated with existing information systems, which are mostly based on structured data models such as relational, object-oriented or object/relational data models. This paper shows how our previous framework for integrating heterogeneous structured data sources can...
Deep Web Data Extraction Based on URL and Domain Classification
ISACA Journal, Volume 4, 2015. The rapid development of computer and networking technologies has increased the popularity of the web, which has led to the presence of more and more information on the web. However, the explosive increase of information online leads to some search problems: specifically, search engines usually return too many unrelated results on a given query. Deep web is content ...